Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes
Authors
Abstract
In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding. Hereby, during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long-Short-Term-Memory-Cells (LSTMs) as encoder, and a decoder which is a Transformer with inserted mutual attention layers. The CTC confidences are computed on the encoder, while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we achieve a competitive Character Error Rate (CER) of 2.95% when pretraining our model on synthetic data and including a character-based language model for contemporary English. Compared to other state-of-the-art approaches, our model requires about 10–20 times less parameters. Our shared implementations are accessible via a link on GitHub.
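The rescoring idea in the abstract can be sketched as follows: the encoder's CTC posterior matrix assigns each beam-search hypothesis the probability that the collapsed CTC output *begins with* that character prefix, so prefixes the CTC matrix considers implausible (e.g. hallucinated or repeated words) receive low scores and are penalised. Below is a minimal NumPy sketch of such a CTC prefix probability, in the style of Graves' prefix search; the function name and the toy posterior matrix are our own illustration, not the authors' implementation:

```python
import numpy as np

def ctc_prefix_prob(probs, prefix, blank=0):
    """Probability, under the CTC posterior matrix `probs` (shape T x V),
    that the collapsed CTC output begins with `prefix`.
    probs[t, c] is the per-frame posterior of symbol c at frame t;
    column `blank` is the CTC blank symbol."""
    T = probs.shape[0]
    # g_b[t] / g_nb[t]: prob. of having emitted the current prefix within the
    # first t frames, with the last frame blank / non-blank respectively.
    g_b = np.ones(T + 1)
    g_nb = np.zeros(T + 1)
    for t in range(1, T + 1):              # empty prefix: only blanks so far
        g_b[t] = g_b[t - 1] * probs[t - 1, blank]
    last = None
    psi = 1.0                              # prefix prob. of the empty prefix
    for c in prefix:                       # extend the prefix one char at a time
        new_b, new_nb = np.zeros(T + 1), np.zeros(T + 1)
        psi = 0.0
        for t in range(1, T + 1):
            # phi: probability mass that may start emitting `c` at frame t
            # (a repeated character needs an intervening blank)
            phi = g_b[t - 1] + (g_nb[t - 1] if c != last else 0.0)
            new_nb[t] = (new_nb[t - 1] + phi) * probs[t - 1, c]
            new_b[t] = (new_b[t - 1] + new_nb[t - 1]) * probs[t - 1, blank]
            psi += phi * probs[t - 1, c]
        g_b, g_nb, last = new_b, new_nb, c
    return psi

# Toy example: 2 frames, vocabulary {blank=0, 'a'=1}, uniform posteriors.
# Three of the four alignments, (b,a), (a,b) and (a,a), collapse to "a".
probs = np.array([[0.5, 0.5],
                  [0.5, 0.5]])
print(ctc_prefix_prob(probs, [1]))      # 0.75
print(ctc_prefix_prob(probs, [1, 1]))   # 0.0 (two 'a's need a blank between)
```

During beam search this score would be combined log-linearly with the S2S decoder's log-probability, e.g. lambda * log psi_CTC + (1 - lambda) * log p_S2S for some weight lambda; the exact combination used in the paper may differ.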
Similar resources
Sequence to Sequence Training of CTC-RNNs with Partial Windowing
Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinde...
Sequence to sequence learning for unconstrained scene text recognition
In this work we present a state-of-the-art approach for unconstrained natural scene text recognition. We propose a cascade approach that incorporates a convolutional neural network (CNN) architecture followed by a long short term memory model (LSTM). The CNN learns visual features for the characters and uses them with a softmax layer to detect sequences of characters. While the CNN gives very go...
Optical Music Recognition with Convolutional Sequence-to-Sequence Models
Optical Music Recognition (OMR) is an important technology within Music Information Retrieval. Deep learning models show promising results on OMR tasks, but symbol-level annotated data sets of sufficient size to train such models are not available and difficult to develop. We present a deep learning architecture called a Convolutional Sequence-to-Sequence model to both move towards an end-to-en...
A Comparison of Sequence-to-Sequence Models for Speech Recognition
In this work, we conduct a detailed evaluation of various all-neural, end-to-end trained, sequence-to-sequence models applied to the task of speech recognition. Notably, each of these systems directly predicts graphemes in the written domain, without using an external pronunciation lexicon, or a separate language model. We examine several sequence-to-sequence models including connectionist tempo...
Sequence-to-Sequence RNNs for Text Summarization
In this work, we cast text summarization as a sequence-to-sequence problem and apply the attentional encoder-decoder RNN that has been shown to be successful for Machine Translation (Bahdanau et al. (2014)). Our experiments show that the proposed architecture significantly outperforms the state-of-the-art model of Rush et al. (2015) on the Gigaword dataset without any additional tuning. We also...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-06555-2_18